Metabolic networks are known to be scale free but the evolutionary origin of this structural 
property is not clearly understood. One way of studying the dynamical process is to compare 
the metabolic networks of species that have arisen at different points in evolution and hence are 
related to each other to varying extents. We have compared the reaction sets of each metabolite 
across and within 15 groups of species. For a given pair of species and a given metabolite, the 
number $\Delta k$ of reactions of the metabolite that appear in the metabolic network of only one 
species and not the other is a measure of the distance between the two networks. While $\Delta k$ is 
small within groups of related species and large across groups, we find its probability distribution 
to be $\sim (\Delta k)^{-\gamma'}$, where $\gamma'$ is a universal exponent that is the same within and across groups. This 
exponent equals, up to statistical uncertainties, the exponent $\gamma$ in the scale free degree distribution 
$\sim k^{-\gamma}$. We argue that this, as well as our finding that $\Delta k$ is approximately linearly correlated 
with the degree $k$ of the metabolite, is evidence of a 'proportionate change' process in evolution. 
We also discuss some molecular mechanisms that might be responsible for such an evolutionary 
process. 



Metabolic networks are known to have a wide, scale 
free distribution of the degree of connectivity of their 
metabolites [1][2], $P(k) \sim k^{-\gamma}$ (for a review see [3]). 
The exponent $\gamma$ has been found to have a value close to 
2.2 across all species of organisms that have been studied 
[1] even though the sets of metabolites and reactions 
in metabolic networks vary quite substantially across 
organisms. This structure suggests that some univer- 
sal process is responsible for the evolution of metabolic 
networks; however at the present time the nature of this 
process is not clearly understood. A comparative study 
of the metabolic networks of organisms that are at vary- 
ing distances from each other on the evolutionary lad- 
der can shed light on this process. Here we report on 
such a study that reveals some universal features of this 
dynamical process. 

For growing networks, such as the internet, a prefer- 
ential attachment of new nodes to higher degree nodes 
0] as well as a proportionate change mechanism 
whereby nodes with higher degree experience proportionately 
higher changes in degree have been proposed 
to account for the scale free structure of the network. 
The latter process can lead to robust exponents [Hj. 
The metabolic network, however, is not a growing net- 
work like the internet; during the course of evolution 
the network has changed without a substantial change 
in the number of metabolites. Furthermore, so far no 
concrete evidence has been presented for a preferential 
attachment or proportionate change process during its 
evolution. Here we also present evidence for a proportionate 
change process in metabolic network evolution. 

† Present address: Niels Bohr Institute for Astronomy, Physics 
* Corresponding author (jain@physics.du.ac.in) 

We downloaded a database of metabolic networks 
of 107 organisms [7]. This contains organisms from all 
three kingdoms: eukaryotes, prokaryotes and archaea, 
arranged in 15 groups including animals, plants, fungi, 
proteobacteria, firmicutes, and others. Organisms in 
different groups are evolutionarily distant, while those 
in the same group are relatively close. We selected 
one species from each group (typically the one having 
the largest number of metabolites) and compared the 
metabolic networks of all 15 species pairwise (105 pairs 
of distant species). We also compared specific pairs of 
nearby species (within the same group). 
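
The figure of 105 pairs follows from choosing 2 of the 15 representative species, 15 x 14 / 2 = 105. A minimal sketch of the enumeration; the species labels are hypothetical placeholders, since the representatives are not all named in the text:

```python
from itertools import combinations

# Hypothetical placeholder labels for the 15 representative species
# (one per group); the real names are not all listed in the text.
representatives = [f"species_{i}" for i in range(1, 16)]

# Each unordered pair of distinct representatives is one comparison.
distant_pairs = list(combinations(representatives, 2))

print(len(distant_pairs))  # 15 * 14 / 2 = 105
```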

The metabolic network of a given species of organ- 
isms is the set of catalysed chemical reactions that can 
take place in the organism through which it converts 
'food molecules' into certain other types of molecules 
needed by the organism. The above database contains 
a list of 5275 metabolic reactions, each reaction char- 
acterized by its chemical equation and whether it is 
reversible or not. In particular, for every reaction the 
list of metabolites participating in the reaction (i.e., 
the set of molecules that are reactants and products of 
the reaction) is available. For each reaction and species 
in the database, information is provided as to whether 
the reaction is present in the metabolic network of that 
species. The metabolic network of a given species typi- 
cally contains several hundred reactions and participat- 
ing metabolites. 

For a given pair of species, say A and B, consider 
the union M of the set of metabolites present in each 
metabolic network. For every metabolite in M, we now 
define $\Delta k$, a measure of the distance between the two 
networks. Consider a metabolite in M, and let $R_A$ 
($R_B$) be the set of reactions in the metabolic network 
of A (B) in which this metabolite participates. The 
number $k_A$ ($k_B$) of reactions in $R_A$ ($R_B$) is the degree 
of the metabolite in the species A (B). Here we con- 
sider only the undirected degree of a metabolite, i.e., 
we do not distinguish whether the metabolite partici- 
pates as a reactant or a product. Reversible reactions 
(forward and reverse pair) in which a metabolite par- 
ticipates are treated as a single reaction for calculating 
its degree. If a metabolite occurs in only one of the 
species (say, A) and not the other (B), then $R_B$ is the 
null set and $k_B = 0$. 

Consider the reactions in $R_A \cup R_B$. The reactions in 
$R_A \cap R_B$ represent the links of the metabolite that are 
common to both species, and hence $k_{AB}$, the size of this 
set, is a measure of how much the reaction set of this 
metabolite has remained 'conserved' in the evolution 
leading to species A and B from their last common 
ancestor. Similarly, the set $(R_A \cup R_B) \setminus (R_A \cap R_B)$, 
that is, the set of reactions in $R_A \cup R_B$ that are not in 
$R_A \cap R_B$, or equivalently those reactions of this 
metabolite that are in one network but not the other, 
measures the divergence between the two networks. The 
size of the latter set will be referred to as the divergence 
of the reaction sets of this metabolite between species A 
and B, and will be denoted $\Delta k$. Note that $\Delta k$ is 
different from the magnitude of $k_A - k_B$. For example, 
$R_A$ and $R_B$ can be different sets of reactions with the 
same number of reactions, in which case $k_A - k_B = 0$ 
while $\Delta k \neq 0$. $\Delta k$ is a measure of the difference 
between the two networks that takes into account the 
identity of reactions and not just their number. 
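
The definitions above map directly onto set operations: the conserved part is an intersection and the divergence is a symmetric difference, so that $\Delta k = k_A + k_B - 2 k_{AB}$. The following is a minimal sketch, not the authors' code. The function name and the reaction identifiers are hypothetical, and it assumes each reversible reaction already appears under a single identifier.

```python
def divergence(reactions_a, reactions_b):
    """Return (k_A, k_B, k_AB, delta_k) for a single metabolite.

    reactions_a, reactions_b: iterables of reaction identifiers in which the
    metabolite participates in species A and B (empty if it is absent).
    Assumption: a reversible reaction is listed once, under one identifier.
    """
    r_a, r_b = set(reactions_a), set(reactions_b)
    k_a = len(r_a)              # degree in species A
    k_b = len(r_b)              # degree in species B
    k_ab = len(r_a & r_b)       # reactions common to both species
    delta_k = len(r_a ^ r_b)    # reactions in exactly one of the two species
    return k_a, k_b, k_ab, delta_k


# Hypothetical identifiers: equal degrees, but mostly different reactions.
print(divergence({"R1", "R2", "R3"}, {"R3", "R4", "R5"}))  # (3, 3, 1, 4)
```

The example has $k_A - k_B = 0$ but $\Delta k = 4$, which is the case the text singles out.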

We computed the degree distribution $P(k)$ for each 
of the 15 organisms as well as the 'divergence probability 
distribution' $Q(\Delta k)$ for each of the 105 pairs. By 
definition, for any pair (A, B), $Q(\Delta k) = n(\Delta k)/|M|$, 
where $|M|$ is the number of metabolites in M and 
$n(\Delta k)$ is the number of metabolites in M whose 
divergence of reaction sets between A and B is equal to 
$\Delta k$. The divergence probability distribution for pairs 
of distant organisms is shown in Fig. 1 and compared 
with the degree distribution. The figure shows 
that $Q(\Delta k) \sim (\Delta k)^{-\gamma'}$ with $\gamma' = \gamma$ up to 
statistical uncertainties in both the exponents. That the 
degree distribution of two species follows the power law 
$P(k) \sim k^{-\gamma}$ is a statement of the present structure of 
the two metabolic networks. This in no way implies 
that $Q(\Delta k)$ should also follow a power law with the 
same exponent. The latter is a distinct statement about 
the dynamical process that leads to the present structure. 
That the $Q(\Delta k)$ distribution has the same form 
for all 105 pairs of distant species considered reflects a 
universal property of the evolutionary process. 
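
As an illustration of the definition $Q(\Delta k) = n(\Delta k)/|M|$, the sketch below tabulates the divergence distribution for one pair of species. It is a toy example under stated assumptions: each network is represented as a dictionary from metabolite name to its set of reaction identifiers, and the function name, metabolite names and reaction identifiers are all hypothetical.

```python
from collections import Counter


def divergence_distribution(network_a, network_b):
    """Return {delta_k: Q(delta_k)} for one pair of species.

    network_a, network_b: dict mapping metabolite -> set of reaction ids
    (an assumed representation, not the database's actual format).
    """
    metabolites = set(network_a) | set(network_b)  # the union M
    counts = Counter(
        # delta_k = size of the symmetric difference of the reaction sets
        len(set(network_a.get(m, ())) ^ set(network_b.get(m, ())))
        for m in metabolites
    )
    total = len(metabolites)  # |M|
    return {dk: counts[dk] / total for dk in sorted(counts)}


# Toy networks with hypothetical metabolites and reactions.
species_a = {"atp": {"R1", "R2", "R3"}, "h2o": {"R1", "R4"}, "glc": {"R2"}}
species_b = {"atp": {"R1", "R5"}, "h2o": {"R1", "R4"}, "pyr": {"R5"}}

q = divergence_distribution(species_a, species_b)
print(q)  # {0: 0.25, 1: 0.5, 3: 0.25}
```

Metabolites with $\Delta k = 0$ are counted in $|M|$ but cannot appear on a logarithmic axis, so a plot like Fig. 1 would show only the entries with $\Delta k \geq 1$.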

A comparison of distant species reveals features of 
the evolutionary process over long time scales. In order 
to study the process over short time scales we compared 
nearby species (that were in the same group). The result 
of three such comparisons is shown in Fig. 2. As 
expected, for each pair of nearby species the absolute 
divergence is smaller than for distant species. This is 
evident from the fact that the $Q(\Delta k)$ curves are well 
below the $P(k)$ curves in Fig. 2, in contrast to Fig. 1 
where they are much closer, and that larger values of 
$\Delta k$ are absent in Fig. 2. However, it can be seen that 
$Q(\Delta k)$ still follows a power law with the same 
exponent as before. This suggests that this feature of the 
evolutionary process is also valid over short evolutionary 
time scales. 

We explored the relationship between $\Delta k$ for a 
metabolite across a pair of species, and its degree in 
each of those species. In particular, one can ask for 
the conditional probability $P(\Delta k \mid k)$ for a metabolite 
to have a reaction set divergence $\Delta k$ across a pair of 
species, given that its average degree in the two species 
is $k$. We found a positive and approximately linear 
correlation between $\Delta k$ and the degree of a metabolite 
(Fig. 3). This is evidence of a 'proportionate change' 
type process [5] in the evolution of metabolic networks. 
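
One way to quantify such a correlation is to average $\Delta k$ within logarithmic bins of degree and fit a straight line on a log-log scale; a slope near 1 indicates an approximately linear relation. The sketch below is an assumption-laden illustration, not the authors' procedure: the base-2 bins, the function names and the synthetic data are all hypothetical choices.

```python
import math
from collections import defaultdict


def log_binned_means(ks, dks):
    """Average k and delta_k inside base-2 logarithmic bins of k (k >= 1)."""
    bins = defaultdict(list)
    for k, dk in zip(ks, dks):
        if k >= 1:
            bins[math.floor(math.log2(k))].append((k, dk))
    return [
        (sum(p[0] for p in pts) / len(pts), sum(p[1] for p in pts) / len(pts))
        for _, pts in sorted(bins.items())
    ]


def loglog_slope(points):
    """Ordinary least-squares slope of log(y) against log(x)."""
    logs = [(math.log(x), math.log(y)) for x, y in points if x > 0 and y > 0]
    n = len(logs)
    mean_x = sum(x for x, _ in logs) / n
    mean_y = sum(y for _, y in logs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in logs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in logs)
    return sxy / sxx


# Synthetic, noise-free data with delta_k exactly proportional to k.
ks = list(range(1, 65))
dks = [0.8 * k for k in ks]
slope = loglog_slope(log_binned_means(ks, dks))
print(round(slope, 6))  # close to 1 for a strictly proportional relation
```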

This provides insight into why the exponents $\gamma'$ and 
$\gamma$ might be equal or very close. Let us assume for 
the moment a perfect correlation between $\Delta k$ and $k$, 
i.e., $P(\Delta k \mid k) \sim \delta(\Delta k - f(k))$, or, equivalently, that 
$\Delta k = f(k)$ for some fixed one-to-one function $f$, and 
also that $P(k) \sim k^{-\gamma}$. Then the statement $f(k) \sim k^{\alpha}$ 
is equivalent to the statement $Q(\Delta k) \sim (\Delta k)^{-\gamma'}$, with 
$\gamma' = \gamma/\alpha$. In particular $\alpha = 1$ implies $\gamma' = \gamma$ and vice 
versa. However, this is not a complete explanation 
because, as is evident from Fig. 3, there is stochasticity 
in the relation between $\Delta k$ and $k$, and not perfect 
correlation. 
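
The step from $f(k) \sim k^{\alpha}$ to the exponent relation can be spelled out as follows. This is a sketch under the idealized assumption of a perfect one-to-one relation, in which the metabolites with divergence $f(k)$ are exactly those with degree $k$:

```latex
\begin{align}
Q\bigl(f(k)\bigr) &= P(k) \sim k^{-\gamma}, \\
\Delta k = f(k) \sim k^{\alpha}
  \quad&\Rightarrow\quad k \sim (\Delta k)^{1/\alpha}, \\
Q(\Delta k) &\sim (\Delta k)^{-\gamma/\alpha}
  \equiv (\Delta k)^{-\gamma'},
  \qquad \gamma' = \gamma/\alpha .
\end{align}
```

If $P$ and $Q$ are instead treated as continuous densities, the change of variables contributes a Jacobian factor and the relation becomes $\gamma' = 1 + (\gamma - 1)/\alpha$. Both forms reduce to $\gamma' = \gamma$ when $\alpha = 1$.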

What kind of molecular process can give rise to this 
linear correlation between Afc and degree of a metabo- 
lite? A reaction is catalyzed by an enzyme to which 
the reactant molecules bind at specific sites in a 3- 
dimensional geometry. The structure of the enzyme 
is determined by the gene or genes that code for it, and 
genes evolve via several mechanisms, including random 
mutations and gene duplication followed by divergence 
through independent mutations in both the copies. A 
metabolite with high degree binds to several enzymes 
that catalyze its reactions. If a gene corresponding to 
one of these enzymes mutates in a manner that dis- 
turbs the binding site of this metabolite on the enzyme, 
the corresponding reaction could be lost. The more 
enzymes the metabolite binds to, the proportionately 
higher is the probability of losing its reactions through 
random mutations. On the other hand if the gene du- 
plicates and diverges, that can introduce a new enzyme 
to which it binds and hence a new reaction for this 
metabolite to participate in. Large degree metabolites 
have a larger pool of interacting enzymes whose genes 
can duplicate, and hence if genes duplicate randomly, 
the number of new reactions a given metabolite partici- 
pates in is also expected to be positively correlated with 
its degree. Thus the same mechanisms, namely gene 
mutations and duplication-divergence, that have been 
considered as mechanisms for proportionate change and 
preferential attachment in protein interaction networks 
[10-13], could operate for metabolic networks also. 
Figure captions 

Fig. 1. The probability distribution of the divergence of reaction sets $Q(\Delta k)$ for evolutionarily distant organisms 
compared with the degree distribution $P(k)$ of those organisms. The blue curve (joining the blue triangles) in 
each figure gives the divergence distribution for the corresponding set of species. For that curve the x-axis of 
the figure represents $\Delta k$ and the y-axis represents $Q(\Delta k)$. The other curves give the degree distribution for 
those species. For these curves the x-axis represents $k$ and the y-axis represents $P(k)$. Both axes are on a 
logarithmic scale in all figures; $k$ and $\Delta k$ are binned logarithmically [H]. Figs. 1a-c compare species from the 
three kingdoms pairwise, namely the eukaryote Homo sapiens, the prokaryote Escherichia coli and the archaean 
Methanosarcina mazei. The $P(k)$ curves for these organisms are given by red dots, green squares and brown 
hexagons, respectively. (a) Comparison between $P(k)$ for H. sapiens (red dots), $P(k)$ for E. coli (green squares) 
and $Q(\Delta k)$ across these two species (blue triangles). (b) A similar comparison of H. sapiens and M. mazei. 
(c) A similar comparison of E. coli and M. mazei. (d) An average taken over 15 species each drawn from a 
different group in the database. For a given metabolite, the average $k$ over the 15 species and the average $\Delta k$ 
across the 105 pairs of species are computed and binned logarithmically. The cyan rhombuses represent $P(k)$ 
and blue triangles $Q(\Delta k)$ in Fig. 1d. The blue lines in Figs. 1a-d are consistent with $Q(\Delta k) \sim (\Delta k)^{-\gamma'}$. The 
least square fit value of the slope $\gamma'$ ± standard error arising from the scatter of the points plotted in the figures 
is (a) 2.16 ± 0.13, (b) 2.21 ± 0.10, (c) 2.23 ± 0.08, (d) 2.13 ± 0.13. The values of the exponent in $P(k) \sim k^{-\gamma}$ 
are $\gamma$ = 2.21 ± 0.09 (red), 2.19 ± 0.11 (green), 2.18 ± 0.12 (brown), and 2.07 ± 0.11 (cyan). For each of the 15 
organisms considered, $\gamma$ ranges from 2.09 to 2.21 with mean ± standard deviation = 2.16 ± 0.05, while for each 
of the 105 pairs $\gamma'$ ranges from 2.09 to 2.37 with a mean of 2.17 ± 0.04. 

Fig. 2. $Q(\Delta k)$ and $P(k)$ for evolutionarily close species within the same group. Conventions are the same as 
for Figs. 1a-c, except that the individual species are different. Though $\Delta k$ values are smaller for close species 
as compared to distant species, $Q(\Delta k)$ nevertheless seems consistent with a power law with the same exponent 
as $P(k)$. Fig. 2a compares two eukaryotes, both yeasts, Saccharomyces cerevisiae ($\gamma$ = 2.09 ± 0.10, green), and 
Schizosaccharomyces pombe ($\gamma$ = 2.18 ± 0.14, pink), and $\gamma'$ is found to be 2.28 ± 0.10 (blue). Fig. 2b compares 
two prokaryotes, both proteobacteria, E. coli (2.19 ± 0.11, green) and Salmonella typhimurium (2.17 ± 0.11, pink); 
$\gamma'$ = 2.18 ± 0.11 (blue). Fig. 2c compares two archaea, Pyrococcus horikoshii (2.25 ± 0.10, green) and Pyrococcus 
furiosus (2.17 ± 0.09, pink); $\gamma'$ = 2.28 ± 0.16 (blue). 

Fig. 3. Positive and approximately linear correlation between the divergence of the reaction set of a 
metabolite and the degree of the metabolite. (a) Scatter plot (on a linear scale) of the average $\Delta k$ of a metabolite 
across the 105 pairs of species versus its average degree across the 15 (distant) species. The lone point on the 
extreme right is a single highly connected metabolite, the hydrogen ion. The plot appears approximately linear 
with some stochasticity. (b) The same on a logarithmic scale where metabolites are placed in logarithmic bins 
according to their average degree, and the average $\Delta k$ for a bin is computed by averaging over all 105 pairs 
of organisms for a given metabolite and then averaging over all metabolites in the bin. The slope of the least 
square fitted straight line ± the standard error of the deviation of points in the figure from the fit is 1.08 ± 0.03. 
(c) $\Delta k$ versus degree of a metabolite for three pairs of distant species. The three pairs of species chosen are 
the same as in Figs. 1a-c. For each pair of species (A, B), the x-axis represents $k_{\max} = \max(k_A, k_B)$ (the larger 
of the two degrees of the metabolite in the two species). The slopes of the three lines are 1.09 ± 0.03 (green) 
for H. sapiens and E. coli; 1.08 ± 0.02 (pink) for H. sapiens and M. mazei; and 1.03 ± 0.02 (cyan) for E. coli 
and M. mazei. (d) $\Delta k$ versus $k_{\max}$ of a metabolite for three pairs of close species. The three pairs of species 
chosen are the same as in Figs. 2a-c. These have a larger scatter than distant species. The slopes of best fit 
lines are 1.03 ± 0.07 (green) for S. cerevisiae and S. pombe; 0.97 ± 0.12 (pink) for E. coli and S. typhimurium; 
and 1.07 ± 0.08 for P. horikoshii and P. furiosus. 